Spoken English Learner Corpora

Authors

Olga Kolesnikova

Oscar-Arturo González-González

Abstract

In this paper we present a survey of some of the most significant spoken English learner corpora created to date. Spoken learner corpora, which contain speech produced by learners, are important in many areas of research and practice, in particular for identifying typical pronunciation errors of learners of English as a second language (ESL), English as a foreign language (EFL), or English as a lingua franca (ELF). Data on common errors are helpful in designing more effective methods of pronunciation teaching as an aspect of language training. Error patterns can also be implemented in intelligent tutoring systems for English learning, in order to design explanations and exercises in an error-preventive way and to generate relevant feedback for the learner. The corpora we survey in this article include various types of English speech produced by learners whose first language (L1) is Arabic, Chinese, French, German, Greek, Japanese, Korean, Norwegian, Polish, or Spanish, among others. Some of the English learner corpora described here were created for a single L1; others were compiled for various first languages. Learner corpora also vary in the type of English they exhibit: ESL, EFL, ELF, or combinations of these.


Related resources

Compiling a Corpus of Taiwanese Students' Spoken English

This paper reports the compilation of a corpus of Taiwanese students’ spoken English, which is one of the twenty subcorpora of the Louvain International Database of Spoken English Interlanguage (LINDSEI) (Gilquin et al., 2010). LINDSEI is one of the largest corpora of learner speech. The compilation process follows the design criteria of LINDSEI so as to ensure comparability across sub-corpora....


Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners

Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...


Phrase Structure Annotation and Parsing for Learner English

There has been almost no work on phrase structure annotation and parsing specially designed for learner English despite the fact that they are useful for representing the structural characteristics of learner English. To address this problem, in this paper, we first propose a phrase structure annotation scheme for learner English and annotate two different learner corpora using it. Second, we s...


Facilitating a description of intercultural conversations: the Hong Kong Corpus of Conversational English

The relative difficulty with which spoken corpora can be compiled by the researcher compared with written discourses, coupled with the time needed to fully transcribe spoken data, to say nothing of the additional expenses involved, inevitably has made large spoken corpora a far rarer entity than written corpora. And yet, if we are to further unravel the intricacies of spoken discourse, we need ...


Error Annotation for Corpus of Japanese Learner English

In this paper, we discuss how error annotation for learner corpora should be done by explaining the state of the art of error tagging schemes in learner corpus research. Several learner corpora, including the NICT JLE (Japanese Learner English) Corpus that we have compiled are annotated with error tagsets designed by categorizing “likely” errors implied from the existing canonical grammar rules...



Journal:

Research in Computing Science

Volume 130

Publication date: 2016
